YouTube API Design Decisions

Learn about the technical considerations that direct the design of the YouTube API.

Designing an API for a streaming service is an intricate task because of the system's inherent complexity. It involves significant technical decisions, for example, the API architecture style to use between the interacting entities and the protocols adopted for transferring streaming data. In the following sections, we'll decide on the primary design considerations that we'll stick to in designing an API for the YouTube streaming service.

Design overview#

The following illustration shows a bird's-eye view of YouTube's primary services: streaming, uploading, searching, commenting, and rating (liking or disliking). The upload service uploads video content to the blob storage and the relevant metadata to the metadata database. The search service efficiently finds relevant videos in the vast video database. Similarly, the comment service enables users to post comments on a video, and these comments can be rated via the rating service.

The various functionalities of the YouTube system

Since we have covered the other services in our foundational design problems, we'll focus on the streaming service, and the services relevant to streaming, in the figure below. All streaming requests from clients pass through the API gateway and are directed to the relevant service, which, in turn, retrieves the required data from the persistence layer. For example, the ad service handles any request related to embedding ads in videos. Its typical responsibilities include communicating with other services to find the optimal ads for specific users, choosing the number of ads to serve, and serving ads during playback.

The YouTube streaming system

The following table describes some of the essential components that are involved in the design of a streaming system.

Components and Services Details

| Component or Service | Details |
|---|---|
| Streaming service | Provides the video streaming service; delivers videos in multiple protocol formats based on the client's device and requirements |
| Advertisement service | Updates the manifest file to decide which advertisement suits a given video; decides the interval and number of advertisements shown to a user; directs advertisement requests to the respective servers |
| Encoding service | Performs encoding, transcoding, and segmentation of videos after upload |
| User data service | Provides user data to other services to improve the user experience |
| API gateway | Fans out requests to the appropriate services; performs identity and access management (IAM) operations; throttles requests and caches frequent API responses |
| Databases | Store users' data, video metadata, and so on |
| Blob storage | Stores audio and video, preferably in segments (chunks) |

Point to Ponder

Question

Is the manifest file generated statically during video processing or dynamically depending on the requests of individual users?

Answer

Manifest files can be generated statically or dynamically, depending upon the requirements and choice of video streaming services. It also depends on the complexity of the streamed content.

Static (one-time) manifest files are pre-generated and delivered to clients as soon as the streaming session begins. This means that the same manifest file is shared with all the users requesting the stream.

In contrast, dynamic manifest files are generated on-the-fly as soon as the request is placed on the server. Each generated manifest file takes into consideration the client’s location, device specifications, network conditions, preferences, etc., to generate a tailored response.

Although dynamic manifest files produce a better user experience and are suitable for dynamic ad insertion, their generation is resource-intensive and complex as compared to the static approach.
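As a concrete illustration, here is a minimal static HLS-style master manifest; the bitrates, resolutions, and playlist paths are assumptions for illustration. A static file like this is shared with all users, and each client picks a rendition from it; a dynamically generated manifest would instead tailor these entries (and ad markers) per request.

```text
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```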

Workflow#

The video streaming process consists of the following two steps:

  • Publishing

  • Streaming

Publishing#

The video publishing process begins by capturing a video and uploading it to the back-end storage. During this process, complex operations are performed on the raw data, such as encoding, compression, and segmentation of the videos and their associated audio. Moreover, the segments are stored in multiple formats to give end users flexibility and an uninterrupted streaming experience. Segments of viral and most-watched videos are later pushed to edge servers, such as CDNs, to provide a better user experience.

Video publishing process
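The segmentation step above can be sketched in a few lines. This is a deliberately simplified illustration: real pipelines cut each encoded rendition on keyframe boundaries, whereas this sketch splits purely by byte count.

```python
def segment(data: bytes, segment_size: int) -> list[bytes]:
    """Split a raw media byte stream into fixed-size segments.

    Real encoders segment on keyframe boundaries per bitrate rendition;
    this sketch splits by byte count only, for illustration.
    """
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]

# A 10,000-byte stream split into 4,096-byte segments yields three segments.
segments = segment(b"x" * 10_000, segment_size=4_096)
```

Each segment would then be stored (per format and per bitrate) in the blob storage, ready to be fetched independently during streaming.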

Streaming#

When a user requests a video, the streaming server sends a manifest file to the client before the video. Using the manifest file, the client fetches video segments at different bitrates based on the network conditions and the type of device. The manifest file also includes information about advertisements, which are played at different intervals during a video via the advertisement service.

A single video consists of many segments; therefore, the client sends a separate request for each video segment. The segments are generated at different bitrates in the publishing stage, which lets the streaming service (a CDN, in the illustration below) support different devices and network conditions. On the client side, the video is decoded, decompressed, and played in a video player.

Note that we use CDN in the illustration below to show that many clients requesting the same stream can be served simultaneously via CDN.

Video streaming process
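The client-side choice of "which bitrate to fetch next" can be sketched as a simple heuristic. The bitrate ladder below is an assumption for illustration; real players also factor in buffer occupancy and device capabilities, not just measured bandwidth.

```python
# Bitrate renditions (bits per second) produced in the publishing stage;
# the exact ladder is an assumption for illustration.
RENDITIONS = [800_000, 2_800_000, 5_000_000]

def pick_bitrate(measured_bps: int, renditions=RENDITIONS) -> int:
    """Pick the highest rendition the measured bandwidth can sustain,
    falling back to the lowest rendition under poor network conditions."""
    affordable = [r for r in renditions if r <= measured_bps]
    return max(affordable) if affordable else min(renditions)

# With ~3 Mbps of measured bandwidth, the 2.8 Mbps rendition is chosen.
choice = pick_bitrate(3_000_000)
```

The client re-evaluates this choice between segment requests, which is what makes the streaming adaptive.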

At this point, we know how YouTube provides streaming by integrating various services within its architecture. In the following section, we decide on some important design issues that direct the design of the YouTube API in the next lesson.

Point to Ponder

Question

What happens if a user pauses a video, goes away for an extended time, comes back, and plays it again?

Answer

Since streaming uses a reliable Transmission Control Protocol (TCP) connection, that connection will be closed (timed out) after the user is away for a while. When the user resumes playback, only the buffered media is played at first; after that, the client playback device establishes a new TCP connection and requests the next chunk/segment in the sequence.
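The resume logic above can be sketched as a mapping from the paused playback position to the next segment index to request. The fixed segment duration is an assumption here; real services use per-video segment timelines from the manifest.

```python
SEGMENT_DURATION_S = 4  # a typical fixed segment length; an assumption here

def resume_segment(paused_at_seconds: float) -> int:
    """Map the paused playback position to the index of the segment the
    client should request once a new TCP connection is established."""
    return int(paused_at_seconds // SEGMENT_DURATION_S)

# Paused at 10.5 s with 4 s segments: segments 0 and 1 are fully played,
# so the client resumes inside segment index 2.
resume_index = resume_segment(10.5)
```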

Design considerations#

In the YouTube design, we have three entities interacting with each other: the client, the API gateway, and the back-end services. Let's decide on the API architecture styles between them. Next, we describe the data formats and the HTTP versions suitable to adopt in the design of the YouTube API.

Architecture styles#

Considering the different architectural styles, let's expand the interaction between the client, the API gateway, and the back-end services.

Client to API gateway: In streaming, the primary task is to retrieve a video; therefore, we only perform the read operation on a resource, which, in this case, is a video. Since retrieving a video is a subset of the CRUD operations, it naturally fits the REST API architecture style. So, here, we adopt the REST style without introducing additional complexity between the interacting entities.

REST API architecture style for the interaction between the client and API gateway
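To make this read-only interaction concrete, the resource URLs a client would GET might look as follows. The base URL, paths, and parameters are hypothetical, chosen only to illustrate the REST style.

```python
BASE = "https://api.example.com/v1"  # hypothetical gateway base URL

def manifest_url(video_id: str) -> str:
    """Build the URL for the read (GET) of a video's manifest resource."""
    return f"{BASE}/videos/{video_id}/manifest"

def segment_url(video_id: str, index: int, bitrate: int) -> str:
    """Build the URL for one segment of a video at a chosen bitrate rendition."""
    return f"{BASE}/videos/{video_id}/segments/{index}?bitrate={bitrate}"
```

The client first GETs the manifest, then issues a series of GETs for segments, so every request maps cleanly onto a REST read of a resource.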

API gateway to back-end servers: The client requests, such as streaming, advertisement, or any other, are dynamically routed by the API gateway to the respective servers.

Each video has an associated group of attributes that are retrieved along with it. However, depending on the query parameters in the request, some attributes must be filtered out. Filtering avoids unnecessary fetching, parsing, and storage of data, making the process time-efficient. At first glance, GraphQL seems necessary to perform such data fetching from different services. However, the filtering operation, in this case, is performed by the individual services, not by GraphQL. Therefore, an additional GraphQL layer between the API gateway and the back-end servers would add complexity and could hurt performance.

Hence, keeping the CRUD operations on videos and filtering attributes in view, we employ the REST architecture style between the API gateway and back-end services.
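The service-side attribute filtering described above can be sketched as a small helper that applies a REST-style `fields` query parameter; the parameter name and the sample attributes are assumptions for illustration.

```python
def filter_fields(resource: dict, fields_param) -> dict:
    """Apply a REST-style `fields` query parameter at the service,
    returning only the requested attributes of a resource.

    An empty or missing parameter returns the full resource.
    """
    if not fields_param:
        return resource
    wanted = {f.strip() for f in fields_param.split(",")}
    return {k: v for k, v in resource.items() if k in wanted}

# Hypothetical metadata record for one video.
video = {"title": "demo", "views": 42, "description": "..."}
trimmed = filter_fields(video, "title,views")
```

Because each service trims its own response this way, the gateway can aggregate results without a dedicated query layer in between.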

REST API architecture style between the API gateway and backend services

HTTP version#

Although support for HTTP streaming was added in HTTP/1.1, it is relatively slower than HTTP/2.0. HTTP/2.0 uses a compact binary framing layer and compresses headers (via HPACK), reducing the number of bytes sent over the wire. Similarly, HTTP/2.0 provides multiplexing, which makes it a better choice for sending multiple streams of content (audio/video segments, subtitles, etc.) over a single TCP connection without the application-layer head-of-line (HOL) blocking present in HTTP/1.1. Moreover, HTTP/2.0 can reset an individual stream on error without tearing down the whole connection. Therefore, we adopt HTTP/2.0 in the design of our API. While HTTP/3.0 is a promising option, it is not yet widely enough adopted to be compatible with different devices across the Internet infrastructure.

Note: YouTube uses the QUIC protocol to retrieve video and audio quickly in different streams. In fact, QUIC was developed by Google, and YouTube was among its early adopters. It is supported in all YouTube mobile applications across different platforms.

Point to Ponder

Question

Can we use HTTP/1.1 for streaming a video?

Answer

Yes, we can use HTTP/1.1 for streaming because HTTP/1.1 connections are persistent by default (the `Connection: keep-alive` behavior). The server keeps the connection open until the video finishes or the client asks to terminate the connection. However, this causes extra delays in communication due to the lack of multiplexing (which might be needed to fetch more than one segment in parallel to improve the user's experience) and header compression.
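For illustration, a raw HTTP/1.1 segment request over a persistent connection might look like the following; the host and path are hypothetical. Each such request must complete before the next one can use the connection, which is the serialization that HTTP/2.0 multiplexing removes.

```text
GET /videos/abc123/segments/7 HTTP/1.1
Host: stream.example.com
Connection: keep-alive
```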

Data formats#

Since the audio and video data require efficient communication between the client and API gateway, it is imperative to use the binary format for sending such data. Our choice of HTTP/2.0 further helps the cause since it compresses the data before transmission.

However, the selection of the data format for metadata should depend on the client. Some clients, such as mobile applications, do not offer web developer tools; for them, binary formats are optimal for efficiency. For clients such as browsers, JSON is a good option for metadata because its human readability makes debugging easier.

Note: The video and audio are encoded in byte streams compressed with a supported compression algorithm—for example, gzip, deflate, or br.

Summary#

In this lesson, we discussed the workflow of a streaming service. Next, we decided on architecture styles between the interacting entities: REST for the client to API gateway and the API gateway to back-end services. Furthermore, we chose HTTP/2.0 as the application layer protocol due to its advantages over other HTTP versions for streaming services. We also adopted the binary format for sending or receiving audio and video data and the JSON format for the metadata.

| Design Considerations | Client to API Gateway | API Gateway to Back-end Services |
|---|---|---|
| API architecture style | REST | REST |
| HTTP version | HTTP/2.0 | HTTP/2.0 |
| Data format | Binary for media data, JSON for metadata | Binary for media data, JSON for metadata |
